This tutorial was made in the context of a seminar on visual communication, focusing on (potentially ambiguous) responses to negated statements ("she did not go to school" -> "yes"). What multimodal resources can one recruit to contextualize, transform, or even replace what is meant in such responses, alongside the lexicalized aspects of speech or sign? How would we analyze the presence and absence of such multimodal resources, and the degree to which they are exploited? There are no single answers or solutions to such questions. But here is one particular operationalization: looking at multiple aspects of communicative movement and extracting features to work with further in your analysis. Importantly, this tutorial also underlines the necessity of sanity checking, using dynamic visualizations, which I think can be helpful for communicating your science too.
We start by overviewing the example data, which contain short responses to statements, and the extracted motion time series. Then we extract head movement features and hand movement features. We follow up with a dynamic, video-based sanity check of these features, and finally aggregate them into a flat dataset for plotting and statistical analysis.
Future work (if I or anyone else who would like to contribute finds some time):
As always, please install the requirements to get all packages needed. Navigate to the folder containing requirements.txt in your terminal (Mac/Linux) or conda command prompt (Windows) and run:
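pip install -r requirements.txt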
Pouw, W. (2024). Wim Pouw's EnvisionBOX modules for social signal processing (Version 1.0.0) [Computer software]. https://github.com/WimPouw/envisionBOX_modulesWP
We are first going to identify the relevant folders. We will be working with masked video data that has already been processed with MediaPipe's holistic tracking module, containing the face kinematics with head rotations, plus body and hand kinematic data. Below we identify the relevant folders and show an example video of the kind of kinematic data we work with.
import os # file and folder handling
import pandas as pd # data wrangling and data frames
from IPython.display import Video # video display in the notebook
import numpy as np # numerical operations
import glob as glob # file handling via wildcard patterns
import matplotlib.pyplot as plt # plotting
#Load in the data that specifies each condition and related information to each trial video
meta = pd.read_csv("../meta/meta_dummy.csv")
#initialize the folder where the processed data is saved
processed_data_fol = '../processed_data/'
#raw_video_fol = '../raw_videos/' #we will not be working with raw videos as they contain identifiable information
#initialize the folder where masked video data is saved
masked_video_fol = '../masked_videos/reencoded/'
augmented_video_fol = '../augmented_videos/'
# for automated annotations
automated_annotation_fol = '../meta/automated_annotations/'
# get all videos
all_videos = glob.glob(masked_video_fol + '*.mp4')
# the raw videos are not used in this tutorial (see the commented-out raw_video_fol above)
# all_raw_videos = glob.glob(raw_video_fol + '*.mp4')
# show the meta data
print(meta.head())
select_example = meta['trial_name'][6]
select_example2 = meta['trial_name'][5]
print('we are going to work with the following example: ' + select_example)
  trial_name condition  type
0   01_b_P16    dummy1  test
1    02_a_P8    dummy1  test
2    09_a_P7    dummy2  test
3   17_a_P15    dummy2  test
4   21_b_P12    dummy1  test
we are going to work with the following example: 34_a_P12
We have 8 masked videos containing responses in German Sign Language (DGS; Loos, Steinbach, & Repp, in prep.).
# show a video of the example in a small size
Video(masked_video_fol + select_example + '.mp4', embed=True, width=300)
# show a video of the second example; together with example 1 above this gives an impression of the full recordings
Video(masked_video_fol + select_example2 + '.mp4', embed=True, width=300)
This video data has been set up in a way that is not necessarily ideal for the computer vision methods we use to study movement. Consider a couple of things in the example videos above and in the concatenated videos further below (showing the full set):
What this means is that the computer vision model will try to infer the body's locations based on the available information about the body posture in the frame. Importantly, video-based motion tracking models usually provide a likelihood or "visibility" variable that gives you some information about whether the current estimate of a body keypoint is reliable or not. Since we are working with opportunistic rather than designed data here, we will build into our feature calculations that the measures are based on visible movements only. However, if you design your own experiment, a good rule of thumb is to have the whole body visible at all times during your recordings.
In the current data the participants also have to interact with the computer to go to the next trial. This requires the body to organize itself into a keypressing device, which means you get head movements that guide gaze as well as hand movements. These movements are not relevant for our analysis, but they are now mixed in with the communicative movements, and we need to be careful not to confuse the two. Of course, all participants need to do this, but as always, people do it differently. It adds noise to measurements that we want to be informative about communicative movement dynamics. Ideally, in such situations we find a way to exclude these movements, either by manual annotation, experiment design, or machine classification.
So some tips I have for motion tracking with videos:
all_videoexample = '../augmented_videos/rerendered/all_videos.mp4'
Video(all_videoexample, embed=True, width=300)
For this tutorial we use MediaPipe Holistic, which combines a face mesh with 478 keypoints, a body (pose) model, and hand models.
We are using the Holistic Mediapipe face model. In the images folder, you can find the particular keypoints by zooming in on this image.
While Mediapipe does not give head rotations by default, we can extract them as exemplified on envisionBOX.
So what are these rotations valuable for, next to position data? Consider that the head moves by rotating around a joint in several directions. We can identify three axes: x (horizontal), y (depth), and z (vertical). There are then three complementary rotations that we can track. One is the up-down head rotation, called pitch, which is rotation around the x-axis and is important for tracking head nodding. Another is called yaw, rotation around the z-axis, which is important for tracking head shaking. Finally there is head tilt, called roll, which is rotation around the y-axis. Each rotation is expressed in angles (degrees in this case). How do you determine which rotation direction is negative versus positive? You use the right-hand rule: point your thumb along the positive direction of the axis, and your fingers then curl in the direction of positive rotation (remember, on the vertical z-axis up is positive, on the horizontal x-axis rightward is positive, and on the depth y-axis forward is positive).
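As a small illustrative sketch of the right-hand rule (not part of the tutorial pipeline, just a generic numpy example): a positive rotation about the x-axis, written as the standard rotation matrix, moves a point lying on the +y (depth) axis toward +z (up), exactly as the curled fingers predict when the thumb points along +x.

import numpy as np
theta = np.deg2rad(10) # a positive 10 degree rotation about the x-axis
Rx = np.array([[1, 0, 0], [0, np.cos(theta), -np.sin(theta)], [0, np.sin(theta), np.cos(theta)]])
point_ahead = np.array([0, 1, 0]) # a point straight ahead on the +y (depth) axis
print(np.round(Rx @ point_ahead, 3)) # [0. 0.985 0.174]: the point moves toward +z (up)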
Then we have the following keypoints for tracking the hands and body. What is nice about MediaPipe's body model is that positions are provided in meters, which gives a better basis for comparison between experiments where the camera settings and distances might differ.
MTface = pd.read_csv(processed_data_fol + select_example + "_face.csv")
MTface.head()
| time | X_0 | Y_0 | Z_0 | X_1 | Y_1 | Z_1 | X_2 | Y_2 | Z_2 | ... | Z_475 | X_476 | Y_476 | Z_476 | X_477 | Y_477 | Z_477 | face_rotation_X | face_rotation_Y | face_rotation_Z | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.544024 | 0.436668 | -0.013681 | 0.550760 | 0.415926 | -0.024691 | 0.547840 | 0.422302 | -0.013349 | ... | 0.010433 | 0.565259 | 0.381678 | 0.010433 | 0.569400 | 0.387891 | 0.010433 | 11.892439 | 10.181866 | -0.244804 |
| 1 | 40.0 | 0.544468 | 0.436542 | -0.012898 | 0.551794 | 0.416177 | -0.024372 | 0.548903 | 0.422113 | -0.013019 | ... | 0.010095 | 0.565199 | 0.381798 | 0.010095 | 0.569612 | 0.388028 | 0.010095 | 10.916988 | 10.123332 | -0.148070 |
| 2 | 80.0 | 0.544588 | 0.436709 | -0.013179 | 0.551604 | 0.415111 | -0.024495 | 0.548733 | 0.421326 | -0.013134 | ... | 0.010164 | 0.565619 | 0.381799 | 0.010164 | 0.569572 | 0.387640 | 0.010164 | 11.697650 | 9.843294 | -0.211414 |
| 3 | 120.0 | 0.544919 | 0.436721 | -0.013239 | 0.551935 | 0.415350 | -0.024400 | 0.549028 | 0.421526 | -0.013097 | ... | 0.010428 | 0.565574 | 0.381823 | 0.010428 | 0.569504 | 0.387658 | 0.010428 | 12.091591 | 10.140566 | -0.216261 |
| 4 | 160.0 | 0.544830 | 0.436383 | -0.013353 | 0.551832 | 0.415057 | -0.024489 | 0.548952 | 0.421228 | -0.013183 | ... | 0.010390 | 0.565646 | 0.381876 | 0.010390 | 0.569494 | 0.387701 | 0.010390 | 12.200451 | 10.061555 | -0.186731 |
5 rows × 1438 columns
MTbody = pd.read_csv(processed_data_fol + select_example + "_body.csv")
MTbody.head()
| time | X_NOSE | Y_NOSE | Z_NOSE | visibility_NOSE | X_LEFT_EYE_INNER | Y_LEFT_EYE_INNER | Z_LEFT_EYE_INNER | visibility_LEFT_EYE_INNER | X_LEFT_EYE | ... | Z_RIGHT_HEEL | visibility_RIGHT_HEEL | X_LEFT_FOOT_INDEX | Y_LEFT_FOOT_INDEX | Z_LEFT_FOOT_INDEX | visibility_LEFT_FOOT_INDEX | X_RIGHT_FOOT_INDEX | Y_RIGHT_FOOT_INDEX | Z_RIGHT_FOOT_INDEX | visibility_RIGHT_FOOT_INDEX | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | -0.009803 | -0.477682 | -0.441003 | 0.999779 | -0.004984 | -0.519970 | -0.427440 | 0.999290 | -0.004659 | ... | -0.078578 | 0.197741 | 0.234344 | 0.460095 | -0.062056 | 0.153068 | 0.127708 | 0.496289 | -0.161851 | 0.140401 |
| 1 | 40.0 | 0.004467 | -0.492178 | -0.431907 | 0.999753 | 0.006054 | -0.534498 | -0.417090 | 0.999249 | 0.006306 | ... | -0.085547 | 0.188496 | 0.235454 | 0.463071 | -0.116253 | 0.150826 | 0.131777 | 0.518380 | -0.167714 | 0.132213 |
| 2 | 80.0 | 0.011162 | -0.502053 | -0.423897 | 0.999736 | 0.012345 | -0.544169 | -0.408166 | 0.999221 | 0.012566 | ... | -0.085490 | 0.180904 | 0.235641 | 0.445308 | -0.122139 | 0.148922 | 0.138256 | 0.511499 | -0.166221 | 0.124766 |
| 3 | 120.0 | 0.009747 | -0.505214 | -0.423979 | 0.999733 | 0.011700 | -0.547160 | -0.408186 | 0.999224 | 0.011940 | ... | -0.142238 | 0.175980 | 0.230127 | 0.473694 | -0.177275 | 0.148191 | 0.121558 | 0.540101 | -0.232836 | 0.119558 |
| 4 | 160.0 | 0.008992 | -0.506831 | -0.423051 | 0.999728 | 0.010939 | -0.548991 | -0.406720 | 0.999230 | 0.011184 | ... | -0.151729 | 0.171626 | 0.229687 | 0.469062 | -0.198221 | 0.146560 | 0.121206 | 0.535221 | -0.246488 | 0.114260 |
5 rows × 133 columns
MThand = pd.read_csv(processed_data_fol + select_example + "_hands.csv")
MThand.head()
| time | X_LEFT_WRIST | Y_LEFT_WRIST | Z_LEFT_WRIST | X_LEFT_THUMB_CMC | Y_LEFT_THUMB_CMC | Z_LEFT_THUMB_CMC | X_LEFT_THUMB_MCP | Y_LEFT_THUMB_MCP | Z_LEFT_THUMB_MCP | ... | Z_RIGHT_PINKY_FINGER_MCP | X_RIGHT_PINKY_FINGER_PIP | Y_RIGHT_PINKY_FINGER_PIP | Z_RIGHT_PINKY_FINGER_PIP | X_RIGHT_PINKY_FINGER_DIP | Y_RIGHT_PINKY_FINGER_DIP | Z_RIGHT_PINKY_FINGER_DIP | X_RIGHT_PINKY_FINGER_TIP | Y_RIGHT_PINKY_FINGER_TIP | Z_RIGHT_PINKY_FINGER_TIP | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.551751 | 0.847832 | -1.384879e-07 | 0.521275 | 0.856829 | -0.000383 | 0.498019 | 0.876869 | -0.005994 | ... | -0.049297 | 0.646602 | 0.976922 | -0.058607 | 0.639863 | 1.003221 | -0.059681 | 0.634477 | 1.024017 | -0.060027 |
| 1 | 40.0 | 0.551462 | 0.852911 | -7.251678e-08 | 0.523370 | 0.857706 | -0.001900 | 0.499798 | 0.875213 | -0.007606 | ... | -0.051304 | 0.646652 | 0.975250 | -0.061600 | 0.639805 | 1.000674 | -0.063055 | 0.634744 | 1.020851 | -0.063710 |
| 2 | 80.0 | 0.549100 | 0.863097 | -8.652839e-08 | 0.522225 | 0.866193 | -0.001693 | 0.499406 | 0.880324 | -0.007805 | ... | -0.052481 | 0.645833 | 0.976005 | -0.062507 | 0.638911 | 1.000885 | -0.063160 | 0.634186 | 1.019804 | -0.063127 |
| 3 | 120.0 | 0.575792 | 0.857033 | -2.867823e-07 | 0.531686 | 0.860352 | 0.002534 | 0.501139 | 0.875207 | -0.003595 | ... | -0.056763 | 0.646199 | 0.976452 | -0.067439 | 0.638868 | 1.001767 | -0.068186 | 0.633784 | 1.020989 | -0.068303 |
| 4 | 160.0 | 0.570742 | 0.871133 | -2.013690e-07 | 0.534970 | 0.870461 | -0.002351 | 0.505394 | 0.880819 | -0.010308 | ... | -0.053326 | 0.646264 | 0.976998 | -0.064041 | 0.638814 | 1.002117 | -0.065170 | 0.633672 | 1.021207 | -0.065505 |
5 rows × 127 columns
So below we identify some relevant markers that we want to keep at hand. The selected markers can be found on the MediaPipe face mesh image (see the images folder); we will use their x, y, z data.
Could you add some keypoints that we might need to calculate mouth opening and mouth widening? You can look up the points on the MediaPipe face mesh image and prepare a new variable. Maybe you have some other points in mind too? Also do a quick check that your new points are indeed in the dataset.
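One possible starting point for this exercise (a sketch; the landmark indices are my suggestion and should be double-checked against the face mesh image in the images folder: 13/14 are commonly used for the inner upper/lower lip, 61/291 for the mouth corners):

# suggested mouth landmarks (verify against the face mesh image): 13/14 = inner upper/lower lip, 61/291 = mouth corners
mouth_opening_points = [['X_13', 'Y_13', 'Z_13'], ['X_14', 'Y_14', 'Z_14']]
mouth_widening_points = [['X_61', 'Y_61', 'Z_61'], ['X_291', 'Y_291', 'Z_291']]
# quick check that these columns are indeed present in the face data
for landmark in mouth_opening_points + mouth_widening_points:
    if landmark[0] not in MTface.columns:
        print('landmark ' + landmark[0] + ' not in the MTface columns')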
### Setting up some relevant kinematic markers
# Load in the facial landmarks data
MTface = pd.read_csv(processed_data_fol + select_example + "_face.csv")
MTbody = pd.read_csv(processed_data_fol + select_example + "_body.csv")
# left eye and eye points
left_inner_eye = [['X_133', 'Y_133', 'Z_133'], ['X_33', 'Y_33', 'Z_33']]
left_brows = [['X_225', 'Y_225', 'Z_225'], ['X_224', 'Y_224', 'Z_224'],['X_223', 'Y_223', 'Z_223'], ['X_222', 'Y_222', 'Z_222'], ['X_221', 'Y_221', 'Z_221']] # left eyebrow points
# right eye and eye points
right_inner_eye = [['X_362', 'Y_362', 'Z_362'], ['X_263', 'Y_263', 'Z_263']]
right_brows = [['X_225', 'Y_225', 'Z_225'], ['X_224', 'Y_224', 'Z_224'], ['X_223', 'Y_223', 'Z_223'], ['X_222', 'Y_222', 'Z_222'], ['X_221', 'Y_221', 'Z_221']] # right eyebrow points; note that these currently duplicate the left-brow indices, replace them with the mirrored right-brow landmarks from the face mesh image for an independent right-brow measure
nosemarkers = [['X_1', 'Y_1', 'Z_1']]
# we also need some normalization points for the brows
normalizationpoints = [['X_6','Y_6', 'Z_6'], ['X_168', 'Y_168', 'Z_168']]
# to make sure that all landmarks are indeed in the MTface lets do a check
all_selected_eye_landmarks = left_inner_eye + left_brows + right_inner_eye + right_brows + nosemarkers
for landmark in all_selected_eye_landmarks:
if landmark[0] not in MTface.columns:
print('landmark ' + landmark[0] + ' not in the MTface columns')
# we also need a center point for centering the face points
body_shoulders = [['X_LEFT_SHOULDER', 'Y_LEFT_SHOULDER', 'Z_LEFT_SHOULDER'], ['X_RIGHT_SHOULDER', 'Y_RIGHT_SHOULDER', 'Z_RIGHT_SHOULDER']]
# hand markers
body_hands = [['X_LEFT_PINKY', 'Y_LEFT_PINKY', 'Z_LEFT_PINKY'],
['X_LEFT_INDEX', 'Y_LEFT_INDEX', 'Z_LEFT_INDEX'],
['X_LEFT_THUMB', 'Y_LEFT_THUMB', 'Z_LEFT_THUMB'],
['X_LEFT_WRIST', 'Y_LEFT_WRIST', 'Z_LEFT_WRIST'],
['X_RIGHT_PINKY', 'Y_RIGHT_PINKY', 'Z_RIGHT_PINKY'],
['X_RIGHT_INDEX', 'Y_RIGHT_INDEX', 'Z_RIGHT_INDEX'],
['X_RIGHT_THUMB', 'Y_RIGHT_THUMB', 'Z_RIGHT_THUMB'],
['X_RIGHT_WRIST', 'Y_RIGHT_WRIST', 'Z_RIGHT_WRIST']]
# to make sure that all landmarks are indeed in the MTbody lets do a check
all_selected_body_landmarks = body_shoulders + body_hands
for landmark in all_selected_body_landmarks:
if landmark[0] not in MTbody.columns:
print('landmark ' + landmark[0] + ' not in the MTbody columns')
# lets also plot a frame of the facemesh and mark the points that we now have selected
# take a row of the data
frame = 1
MTface_frame = MTface.iloc[frame]
# collect the X and Y column names (we ignore Z for this 2D plot)
xcols = [col for col in MTface_frame.index if 'X_' in col]
ycols = [col for col in MTface_frame.index if 'Y_' in col]
# remove all rotations
MTface_frame = MTface_frame[MTface_frame.index.str.contains('rotation_')==False]
# plot the 2d face
plt.figure(figsize=(7,7))
plt.scatter(MTface_frame[xcols], MTface_frame[ycols]*-1) # flip y so the face is plotted upright (image y runs downward)
# mark all_selected_eye_landmarks
for landmark in all_selected_eye_landmarks:
plt.scatter(MTface_frame[landmark[0]], MTface_frame[landmark[1]]*-1, c='r')
plt.show()
Below we plot the head movement from the example video above. We plot all rotations over time. We also apply some smoothing to reduce noise-related jitter. We use a Savitzky-Golay filter for this, with a window span of x (determining how many adjacent points are taken into account to determine the current point value; larger spans increase smoothness) and a polynomial order of y (lower orders smooth more aggressively, while higher orders follow the raw signal, including its high-frequency jitter, more closely), such that savgol_filter(timeseries, x, y).
As an exercise, try commenting out the smoothing, or adjust the smoothing values, to assess the differences in the time series curve.
Winter, D. A. (2009). Biomechanics and Motor Control of Human Movement (Chapter 2). John Wiley & Sons.
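A minimal sketch for the smoothing exercise (the settings here are just for illustration): overlaying the raw pitch with two different filter settings makes the effect of the window length visible.

# compare the raw head pitch with two different Savitzky-Golay settings
from scipy.signal import savgol_filter
raw_pitch = pd.read_csv(processed_data_fol + select_example + "_face.csv")['face_rotation_X']
plt.figure(figsize=(10, 4))
plt.plot(raw_pitch, label='raw pitch')
plt.plot(savgol_filter(raw_pitch, 11, 3), label='window 11, order 3')
plt.plot(savgol_filter(raw_pitch, 31, 3), label='window 31, order 3 (smoother)')
plt.xlabel('frame')
plt.ylabel('rotation in degrees')
plt.legend()
plt.show()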
from scipy.signal import savgol_filter
import matplotlib.pyplot as plt
MTface = pd.read_csv(processed_data_fol + select_example + "_face.csv")
MTbody = pd.read_csv(processed_data_fol + select_example + "_body.csv")
# extract the three head rotations (pitch, yaw, roll)
xrot = MTface['face_rotation_X']
yrot = MTface['face_rotation_Y']
zrot = MTface['face_rotation_Z']
# smooth the pitch
xrot = savgol_filter(xrot, 11, 3) # you can change the span and order here, and you see that the smoothness of the curve will change accordingly
yrot = savgol_filter(yrot, 11, 3)
zrot = savgol_filter(zrot, 11, 3)
# now we can plot all three head rotations in one figure
plt.figure(figsize=(10,5))
plt.plot(xrot, label = 'pitch')
plt.plot(yrot, label = 'yaw')
plt.plot(zrot, label = 'roll')
plt.xlabel('frame')
plt.ylabel('rotation in degrees')
plt.title('head rotations')
plt.legend()
plt.show()
From the time series we can extract features that are indicative of the dynamics of the time-varying behavior. For head movements, what could be of interest is how many turns or nods there are, and how deep the nods and turns are on average or maximally. So let's start with the number of peaks and the prominences of the peaks (how much a peak stands out relative to the surrounding signal). We use the Python package scipy for this (https://docs.scipy.org/doc/scipy-1.13.1/reference/generated/scipy.signal.peak_prominences.html).
As an exercise, try to think about another measure that we can extract from the rotation data. Think, for example, about rhythmicity, the tempo at which the peaks alternate, or the symmetry of the nods (a small sketch follows after the peak-detection example below).
# store the peaks and troughs of the head pitch, and plot them on the figure
from scipy.signal import find_peaks, peak_prominences
# peak function for head rotations: 5 degrees minimum height, 4 frames minimum distance (160 ms at 25 frames per second), minimum prominence of 2
def get_peak_head_data(rot, threshpeak=5, peakdist = 4, prominence=2):
"""This function takes a rotation time series and returns the peaks and troughs of the rotation time series."""
peaks, _ = find_peaks(rot, height=threshpeak, distance=peakdist, prominence = prominence) # Find peaks above the threshold
troughs, _ = find_peaks(rot * -1, height=threshpeak*-1, distance=peakdist, prominence = prominence) # Find troughs by inverting the series
# Initialize default values
peak_prom = trough_prom = np.array([])
numpeaks = avgprom = maxprom = np.nan
if len(peaks) > 0 and len(troughs) > 0:
# Get the prominences of the peaks and troughs
peak_prom = peak_prominences(rot, peaks)[0]
trough_prom = peak_prominences(rot * -1, troughs)[0]
# Calculate summary statistics
numpeaks = len(peaks) + len(troughs)
avgprom = np.mean(peak_prom) + np.mean(trough_prom)
maxprom = np.max(peak_prom) + np.max(trough_prom)
return peaks, peak_prom, troughs, trough_prom, numpeaks, avgprom, maxprom
#apply the function
peaks, peak_prom, troughs, trough_prom, numpeaks, avgprom, maxprom = get_peak_head_data(xrot)
print("peaks: " + str(peaks))
# do some plotting with the peaks overlay
plt.figure(figsize=(5,5))
plt.plot(xrot, label='pitch')
plt.plot(peaks, xrot[peaks], ".", label='peaks', markersize=10)
plt.plot(troughs, xrot[troughs], "x", label='troughs', markersize=10)
# Plot prominences for peaks
# note that zip is used to iterate over two lists at the same time
for peak, prom in zip(peaks, peak_prom):
plt.vlines(x=peak, ymin=xrot[peak] - prom, ymax=xrot[peak], color='C1', linestyle='--', label='peak prominences' if peak == peaks[0] else "")
# Plot prominences for troughs
for trough, prom in zip(troughs, trough_prom):
plt.vlines(x=trough, ymin=xrot[trough], ymax=xrot[trough] + prom, color='C2', linestyle='--', label='trough prominences' if trough == troughs[0] else "")
# also plot the number of peaks and the average and max prominence in an upper left panel
plt.text(-3, 15, 'numpeaks: ' + str(round(numpeaks, 2)) + '\navgprom: ' + str(round(avgprom, 2)) + '\nmaxprom: ' + str(round(maxprom, 2)), fontsize=12)
plt.xlabel('frame')
plt.ylabel('rotation in degrees')
plt.title('head rotations')
plt.legend()
plt.show()
peaks: [43 64]
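As one possible answer to the rhythmicity exercise above, here is a minimal sketch (assuming 25 frames per second, as elsewhere in this tutorial): the intervals between successive extrema give a rough estimate of the nodding tempo.

# rough tempo estimate from the inter-extrema intervals (assuming 25 fps)
fps = 25
all_extrema = np.sort(np.concatenate([peaks, troughs])) # peaks and troughs in temporal order
if len(all_extrema) > 1:
    intervals = np.diff(all_extrema) / fps # seconds between successive extrema
    print('mean interval between extrema (s): ' + str(np.round(np.mean(intervals), 3)))
    print('approximate tempo (extrema per second): ' + str(np.round(1 / np.mean(intervals), 2)))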
We can do a similar thing for the hand movement. We will work with the body time series, which includes the index fingers. Let's first plot the raw movement data, to get a sense of what this looks like for example 2, where there is some hand movement. Note that movement researchers usually use x, y, z to indicate the horizontal, depth, and vertical dimensions. MediaPipe deviates from this and uses y for the vertical dimension. A bit confusing, but we'll work with it for now (it also teaches you to always be careful when interpreting axes).
# Lets also extract the kinematics of the hands
MTbody = pd.read_csv(processed_data_fol + select_example2 + "_body.csv")
# lets smooth the data for the index fingers in one line
coln = ['X_LEFT_INDEX', 'Y_LEFT_INDEX', 'Z_LEFT_INDEX', 'X_RIGHT_INDEX', 'Y_RIGHT_INDEX', 'Z_RIGHT_INDEX']
MTbody[coln] = MTbody[coln].apply(lambda x: savgol_filter(x, 7, 3))
# lets center the index finger by the overall mean
MTbody['X_LEFT_INDEX'] = MTbody['X_LEFT_INDEX'] - MTbody['X_LEFT_INDEX'].mean()
MTbody['Y_LEFT_INDEX'] = (MTbody['Y_LEFT_INDEX'] - MTbody['Y_LEFT_INDEX'].mean())*-1 # note here that i flip the axis so that up is positive
MTbody['Z_LEFT_INDEX'] = MTbody['Z_LEFT_INDEX'] - MTbody['Z_LEFT_INDEX'].mean()
MTbody['X_RIGHT_INDEX'] = MTbody['X_RIGHT_INDEX'] - MTbody['X_RIGHT_INDEX'].mean()
MTbody['Y_RIGHT_INDEX'] = (MTbody['Y_RIGHT_INDEX'] - MTbody['Y_RIGHT_INDEX'].mean())*-1 # note here that i flip the axis so that up is positive
MTbody['Z_RIGHT_INDEX'] = MTbody['Z_RIGHT_INDEX'] - MTbody['Z_RIGHT_INDEX'].mean()
# lets plot the hand kinematics
plt.figure(figsize=(10,5))
plt.plot(MTbody['X_LEFT_INDEX'], label = 'X_LEFT_INDEX')
plt.plot(MTbody['Y_LEFT_INDEX'], label = 'Y_LEFT_INDEX')
plt.plot(MTbody['Z_LEFT_INDEX'], label = 'Z_LEFT_INDEX')
plt.plot(MTbody['X_RIGHT_INDEX'], label = 'X_RIGHT_INDEX')
plt.plot(MTbody['Y_RIGHT_INDEX'], label = 'Y_RIGHT_INDEX')
plt.plot(MTbody['Z_RIGHT_INDEX'], label = 'Z_RIGHT_INDEX')
plt.xlabel('frame')
plt.ylabel('position in meters')
plt.title('hand kinematics')
plt.legend()
plt.show()
We will start with the speed. Below is a calculation of this. We calculate the 2D speed, because the depth dimension is less reliable in mediapipe.
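In equation form (mirroring the code below, with $f$ the assumed frame rate in frames per second and positions in meters), the 2D speed at frame $i$ in cm/s is:

$$v_i = 100 \cdot f \cdot \sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2}$$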
MTbody = pd.read_csv(processed_data_fol + select_example + "_body.csv")
# lets compute the speed based on 2D data
def calcspeed(MTbody, coln, assumed_framerate = 30):
"""This function calculates the speed of the hands in cm/s"""
# lets smooth the data for the index fingers in one line
# smooth using lambda
MTbody[coln] = MTbody[coln].apply(lambda x: savgol_filter(x, 7, 3))
    handspeed = np.sqrt((MTbody[coln[0]].diff()**2) + (MTbody[coln[1]].diff()**2)) # note: we are not using the z dimension
    handspeed = handspeed*assumed_framerate # per second
    handspeed = handspeed*100 # meters to cm
    # set the speed to zero for frames where the keypoint is not reliably visible
    not_visible = MTbody['visibility_' + coln[0].split('_')[1]+'_'+coln[0].split('_')[2]] < .85
    handspeed[not_visible] = 0
#set NaN to 0
handspeed = handspeed.fillna(0)
# smooth the speed
handspeed = savgol_filter(handspeed, 7, 3)
return handspeed
# get the speed
MTbody['speed_left_index'] = calcspeed(MTbody, coln = ['X_LEFT_INDEX', 'Y_LEFT_INDEX'])
MTbody['speed_right_index'] = calcspeed(MTbody, coln = ['X_RIGHT_INDEX', 'Y_RIGHT_INDEX'])
# plot handspeed
plt.figure(figsize=(10,5))
plt.plot(MTbody['speed_left_index'], label = 'speed_left_index')
plt.legend()
plt.show()
We can extract some features based on the peaks of the hand speed: the number of peaks, the average peak height (average peak velocity), and the maximum peak.
As an exercise, try to think about another measure that we can extract. Maybe we want to be more specific and extract peaks from the vertical or horizontal velocity instead, which contains information about movement direction (a sketch follows after the example below).
# Set up a function that extracts the peaks and troughs of the hand speed
def get_peak_hand_data(ts, dist = 8, thresh = 15, prominence = 5):
    """This function takes a hand speed time series and returns its peaks plus some summary statistics"""
    peaks, _ = find_peaks(ts, height=thresh, distance=dist, prominence = prominence) # minimum height of 15 cm/s, at least 8 frames between peaks, minimum prominence of 5
    # also collect some summary statistics
    # number of peaks
    numpeaks = len(peaks)
    # average peak velocity across movements
    average_peak_vel = np.mean(ts[peaks])
    # maximum peak velocity
    maxpeak = np.max(ts[peaks])
return peaks, numpeaks, average_peak_vel, maxpeak
peaks, numpeaks, average_peak_vel, maxpeak = get_peak_hand_data(MTbody['speed_left_index'])
print("peaks: " + str(peaks))
plt.figure(figsize=(5,5))
plt.plot(MTbody['speed_left_index'], label='speed_left_index')
plt.plot(peaks, MTbody['speed_left_index'][peaks], ".", label='peaks', markersize=10)
# add some text
plt.text(10, 0, 'numpeaks: ' + str(round(numpeaks, 2)) + '\navgpeakvel: ' + str(round(average_peak_vel, 2)) + '\nmaxpeak: ' + str(round(maxpeak, 2)), fontsize=12)
plt.xlabel('frame')
plt.ylabel('speed in cm/s')
plt.title('hand speed_left_index')
plt.legend()
plt.show()
peaks: [43 60]
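As a sketch for the exercise above (the threshold and frame rate are only example values): instead of the unsigned speed we can look at the signed vertical velocity of the index finger, which preserves whether the hand is moving up or down. The sign is flipped because, as noted earlier, MediaPipe's y-axis points downward.

# signed vertical velocity of the left index finger (assuming 25 fps; threshold is only an example)
fps = 25
vert_vel = -MTbody['Y_LEFT_INDEX'].diff() * fps * 100 # cm/s, sign flipped so that positive = upward
vert_vel = savgol_filter(vert_vel.fillna(0), 7, 3)
up_peaks, _ = find_peaks(vert_vel, height=15, distance=8) # upward strokes
down_peaks, _ = find_peaks(vert_vel * -1, height=15, distance=8) # downward strokes
print('upward velocity peaks: ' + str(up_peaks))
print('downward velocity peaks: ' + str(down_peaks))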
We can further look at the gesture space as a form of prominence. What is the space that is used to gesture?
# we also add some information about the max hand height and max horizontal space (gesture space)
def gesturespace(MT):
"""This function calculates the maximum height and width of the hand gestures"""
# set the position to the overall mean if visibility is below 0.85
    MT.loc[MT['visibility_LEFT_INDEX'] < 0.85, 'X_LEFT_INDEX'] = MT['X_LEFT_INDEX'].mean()
    MT.loc[MT['visibility_LEFT_INDEX'] < 0.85, 'Y_LEFT_INDEX'] = MT['Y_LEFT_INDEX'].mean()
    MT.loc[MT['visibility_RIGHT_INDEX'] < 0.85, 'X_RIGHT_INDEX'] = MT['X_RIGHT_INDEX'].mean()
    MT.loc[MT['visibility_RIGHT_INDEX'] < 0.85, 'Y_RIGHT_INDEX'] = MT['Y_RIGHT_INDEX'].mean()
# get the max and min of the left and right hand
minhorizontalleft = np.min(MT['X_LEFT_INDEX'])
maxhorizontalleft = np.max(MT['X_LEFT_INDEX'])
minverticalleft = np.min(MT['Y_LEFT_INDEX'])
maxverticalleft = np.max(MT['Y_LEFT_INDEX'])
minhorizontalright = np.min(MT['X_RIGHT_INDEX'])
maxhorizontalright = np.max(MT['X_RIGHT_INDEX'])
minverticalright = np.min(MT['Y_RIGHT_INDEX'])
maxverticalright = np.max(MT['Y_RIGHT_INDEX'])
# just take the min and max for both left and right
minhorizontal = np.min([minhorizontalleft, minhorizontalright])
maxhorizontal = np.max([maxhorizontalleft, maxhorizontalright])
minvertical = np.min([minverticalleft, minverticalright])
maxvertical = np.max([maxverticalleft, maxverticalright])
    # calculate the space as the area of the bounding rectangle
    widthgesture = (maxhorizontal-minhorizontal)
    heightgesture = (maxvertical-minvertical)
    gspace = heightgesture*widthgesture
return gspace, heightgesture, widthgesture, minhorizontal, maxhorizontal, minvertical, maxvertical
# plot the x, y of both the left and right index finger and show the gesture space rectangle
gspace, heightgesture, widthgesture, minhorizontal, maxhorizontal, minvertical, maxvertical = gesturespace(MTbody)
plt.figure(figsize=(10,5))
plt.plot(MTbody['X_LEFT_INDEX'], MTbody['Y_LEFT_INDEX'], label = 'LEFT_INDEX')
plt.plot(MTbody['X_RIGHT_INDEX'], MTbody['Y_RIGHT_INDEX'], label = 'RIGHT_INDEX')
plt.plot([minhorizontal, minhorizontal], [minvertical, maxvertical], color='black', linestyle='--')
plt.plot([maxhorizontal, maxhorizontal], [minvertical, maxvertical], color='black', linestyle='--')
plt.plot([minhorizontal, maxhorizontal], [minvertical, minvertical], color='black', linestyle='--')
plt.plot([minhorizontal, maxhorizontal], [maxvertical, maxvertical], color='black', linestyle='--')
# add a label for the gesture space
plt.text(minhorizontal+0.005, minvertical+0.005, 'gesture space: ' + str(round(gspace, 2)) + '\nheight: ' + str(round(heightgesture, 2)) + '\nwidth: ' + str(round(widthgesture, 2)), fontsize=12)
plt.xlabel('X position in meters')
plt.ylabel('Y position in meters')
plt.title('hand kinematics')
plt.legend()
plt.show()
plt.close()
It is very tempting to now go ahead, extract all these features for each trial, and do your statistical analysis. But a good in-between step is sanity checking: do my time series data and my extracted measures match up with the actual audiovisual data? Do they make sense, and do they indeed capture something of the dynamics you are interested in? Or should I use other parameters for smoothing, select different keypoints, or even devise completely different measures based on what we see?
# lets make a plotting function that combines the above plots
def plot_multiple_speedpeaks_gesturespace(MTface, MTbody, current_frame_counter=0):
    # suppress interactive plotting
    plt.ioff()
    # make 4 panels (upper left: head rotations plus peaks, upper right: gesture space, lower left: hand speed, lower right: placeholder image)
fig, axs = plt.subplots(2, 2, figsize=(10,10))
# plot the head rotations
xrot = MTface['face_rotation_X']
yrot = MTface['face_rotation_Y']
zrot = MTface['face_rotation_Z']
axs[0, 0].plot(xrot, label = 'pitch')
axs[0, 0].plot(yrot, label = 'yaw')
axs[0, 0].plot(zrot, label = 'roll')
axs[0, 0].set_xlabel('frame')
axs[0, 0].set_ylabel('rotation in degrees')
axs[0, 0].set_title('head rotations')
# set x and y max to highest range
axs[0, 0].set_ylim(-30, 30) # set to 30 degree
# add labels
axs[0, 0].legend()
# get the peaks for pitch
peaks, peak_prom, troughs, trough_prom, numpeaks, avgprom, maxprom = get_peak_head_data(xrot)
if not np.isnan(numpeaks):
axs[0, 0].plot(peaks, xrot[peaks], ".", label='peaks', markersize=10)
        axs[0, 0].plot(troughs, xrot[troughs], ".", label='troughs', markersize=10)
for peak, prom in zip(peaks, peak_prom):
axs[0, 0].vlines(x=peak, ymin=xrot[peak] - prom, ymax=xrot[peak], color='C1', linestyle='--', label='peak prominences' if peak == peaks[0] else "")
for trough, prom in zip(troughs, trough_prom):
axs[0, 0].vlines(x=trough, ymin=xrot[trough], ymax=xrot[trough] + prom, color='C2', linestyle='--', label='trough prominences' if trough == troughs[0] else "")
axs[0, 0].text(-3, -30, 'maxprom pitch: ' + str(round(maxprom, 2)), fontsize=12)
# get the peaks for yaw
peaks, peak_prom, troughs, trough_prom, numpeaks, avgprom, maxprom = get_peak_head_data(yrot)
if not np.isnan(numpeaks):
axs[0, 0].plot(peaks, yrot[peaks], ".", label='peaks', markersize=10)
        axs[0, 0].plot(troughs, yrot[troughs], ".", label='troughs', markersize=10)
for peak, prom in zip(peaks, peak_prom):
axs[0, 0].vlines(x=peak, ymin=yrot[peak] - prom, ymax=yrot[peak], color='C1', linestyle='--', label='peak prominences' if peak == peaks[0] else "")
for trough, prom in zip(troughs, trough_prom):
axs[0, 0].vlines(x=trough, ymin=yrot[trough], ymax=yrot[trough] + prom, color='C2', linestyle='--', label='trough prominences' if trough == troughs[0] else "")
axs[0, 0].text(-3, -27.5, 'maxprom yaw: ' + str(round(maxprom, 2)), fontsize=12)
# get the peaks for roll
peaks, peak_prom, troughs, trough_prom, numpeaks, avgprom, maxprom = get_peak_head_data(zrot)
if not np.isnan(numpeaks):
axs[0, 0].plot(peaks, zrot[peaks], ".", label='peaks', markersize=10)
        axs[0, 0].plot(troughs, zrot[troughs], ".", label='troughs', markersize=10)
for peak, prom in zip(peaks, peak_prom):
axs[0, 0].vlines(x=peak, ymin=zrot[peak] - prom, ymax=zrot[peak], color='C1', linestyle='--', label='peak prominences' if peak == peaks[0] else "")
for trough, prom in zip(troughs, trough_prom):
axs[0, 0].vlines(x=trough, ymin=zrot[trough], ymax=zrot[trough] + prom, color='C2', linestyle='--', label='trough prominences' if trough == troughs[0] else "")
# add text with only the max prominences for each rotation x, y, z
axs[0, 0].text(-3, -25, 'maxprom roll: ' + str(round(maxprom, 2)), fontsize=12)
# plot the gesture space
gspace, heightgesture, widthgesture, minhorizontal, maxhorizontal, minvertical, maxvertical = gesturespace(MTbody)
axs[0, 1].plot(MTbody['X_LEFT_INDEX'], MTbody['Y_LEFT_INDEX'], label = 'LEFT_INDEX')
axs[0, 1].plot(MTbody['X_RIGHT_INDEX'], MTbody['Y_RIGHT_INDEX'], label = 'RIGHT_INDEX')
axs[0, 1].plot([minhorizontal, minhorizontal], [minvertical, maxvertical], color='black', linestyle='--')
axs[0, 1].plot([maxhorizontal, maxhorizontal], [minvertical, maxvertical], color='black', linestyle='--')
axs[0, 1].plot([minhorizontal, maxhorizontal], [minvertical, minvertical], color='black', linestyle='--')
axs[0, 1].plot([minhorizontal, maxhorizontal], [maxvertical, maxvertical], color='black', linestyle='--')
    axs[0, 1].set_xlabel('meters')
    axs[0, 1].set_ylabel('meters')
axs[0, 1].set_title('index finger positions')
axs[0, 1].legend()
# set x and y max to highest range
axs[0, 1].set_xlim(-0.75, 0.75) # set to 1.5 meter
axs[0, 1].set_ylim(-0.75, 0.75) # set to 1.5 meter
# add a label for the gesture space
axs[0, 1].text(-0.75, -0.75, 'gesture space: ' + str(round(gspace, 2)) + '\nheight: ' + str(round(heightgesture, 2)) + '\nwidth: ' + str(round(widthgesture, 2)), fontsize=12)
# plot the hand speed
axs[1, 0].plot(MTbody['speed_left_index'], label='speed_left_index')
axs[1, 0].plot(MTbody['speed_right_index'], label='speed_right_index')
axs[1, 0].set_xlabel('frame')
axs[1, 0].set_ylabel('speed in cm/s')
axs[1, 0].set_title('hand speed')
# set max 250 cm/s
axs[1, 0].set_ylim(0, 250)
axs[1, 0].legend()
# get the peaks for left hand speed
peaks, numpeaks, average_peak_vel, maxpeak = get_peak_hand_data(MTbody['speed_left_index'])
if not np.isnan(numpeaks):
axs[1, 0].plot(peaks, MTbody['speed_left_index'][peaks], ".", label='peaks', markersize=10)
# add some text for maxspeed only
axs[1, 0].text(0, 230, 'maxpeak left: ' + str(round(maxpeak, 2)), fontsize=12)
# get the peaks for right hand speed
peaks, numpeaks, average_peak_vel, maxpeak = get_peak_hand_data(MTbody['speed_right_index'])
if not np.isnan(numpeaks):
axs[1, 0].plot(peaks, MTbody['speed_right_index'][peaks], ".", label='peaks', markersize=10)
# add some text for maxspeed only
axs[1, 0].text(0, 240, 'maxpeak right: ' + str(round(maxpeak, 2)), fontsize=12)
    ### frame counter indications
    # for the lower right panel lets put a placeholder image of envisionbox
axs[1, 1].imshow(plt.imread('./images/envision_banner.png'))
# add a vertical line to the speed and rotation plots at framecounter
axs[0, 0].axvline(x=current_frame_counter, color='black', linestyle='--')
axs[1, 0].axvline(x=current_frame_counter, color='black', linestyle='--')
# now add an arrow to the current x,y position at frame counter
currentlocleftindex = [MTbody['X_LEFT_INDEX'][current_frame_counter], MTbody['Y_LEFT_INDEX'][current_frame_counter]]
currentlocrightindex = [MTbody['X_RIGHT_INDEX'][current_frame_counter], MTbody['Y_RIGHT_INDEX'][current_frame_counter]]
axs[0, 1].plot(currentlocleftindex[0], currentlocleftindex[1], 'o', color='black')
axs[0, 1].plot(currentlocrightindex[0], currentlocrightindex[1], '*', color='black')
plt.close()
return fig
#### lets do some testing of this function
# lets load in the MTbody and MTface again
MTface = pd.read_csv(processed_data_fol + select_example + "_face.csv")
MTbody = pd.read_csv(processed_data_fol + select_example + "_body.csv")
# smooth the data for the face (rotations) and body (index)
colnamesf = ['face_rotation_X', 'face_rotation_Y', 'face_rotation_Z']
MTface[colnamesf] = MTface[colnamesf].apply(lambda x: savgol_filter(x, 7, 3))
# center the face data using lambda
MTface[colnamesf] = MTface[colnamesf].apply(lambda x: x - x.mean())
colnamesb = ['X_LEFT_INDEX', 'Y_LEFT_INDEX', 'Z_LEFT_INDEX', 'X_RIGHT_INDEX', 'Y_RIGHT_INDEX', 'Z_RIGHT_INDEX']
# flip the y axis
MTbody['Y_LEFT_INDEX'] = MTbody['Y_LEFT_INDEX']*-1
MTbody['Y_RIGHT_INDEX'] = MTbody['Y_RIGHT_INDEX']*-1
# smooth the body data
MTbody[colnamesb] = MTbody[colnamesb].apply(lambda x: savgol_filter(x, 17, 3))
# add speed
MTbody['speed_left_index'] = calcspeed(MTbody, coln = ['X_LEFT_INDEX', 'Y_LEFT_INDEX'])
MTbody['speed_right_index'] = calcspeed(MTbody, coln = ['X_RIGHT_INDEX', 'Y_RIGHT_INDEX'])
# test the function
plotbig = plot_multiple_speedpeaks_gesturespace(MTface, MTbody)
plotbig
The following code takes quite some time, as a plot needs to be rendered for every single video frame (!). We first set up the preprocessing that we might want to change.
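If this becomes too slow, one possible speed-up (a sketch, not used in the code below) is to re-render the inset figure only every n-th frame and reuse the cached image in between; the moving marker then updates a bit less smoothly, but rendering time drops accordingly.

# sketch: re-render the inset plot only every n-th frame and reuse the cached image in between
import cv2
_plot_cache = {} # reset this between videos
def cached_plot_image(frame_number, MTface, MTbody, temppath, render_every=5):
    """Return the inset plot image, re-rendering it only every `render_every` frames."""
    if frame_number % render_every == 0 or 'img' not in _plot_cache:
        fig = plot_multiple_speedpeaks_gesturespace(MTface, MTbody, current_frame_counter=frame_number)
        fig.savefig(temppath)
        _plot_cache['img'] = cv2.imread(temppath)
    return _plot_cache['img']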
# preprocessing settings
def preprocessing(MTface, MTbody):
"""This function preprocesses the data for the face and body"""
# smooth the face data
colnamesf = ['face_rotation_X', 'face_rotation_Y', 'face_rotation_Z']
MTface[colnamesf] = MTface[colnamesf].apply(lambda x: savgol_filter(x, 7, 3))
MTface[colnamesf] = MTface[colnamesf].apply(lambda x: x - x.mean())
# smooth the body data
colnamesb = ['X_LEFT_INDEX', 'Y_LEFT_INDEX', 'Z_LEFT_INDEX', 'X_RIGHT_INDEX', 'Y_RIGHT_INDEX', 'Z_RIGHT_INDEX']
MTbody[colnamesb] = MTbody[colnamesb].apply(lambda x: savgol_filter(x, 7, 3))
# flip the y axis
MTbody['Y_LEFT_INDEX'] = MTbody['Y_LEFT_INDEX']*-1
MTbody['Y_RIGHT_INDEX'] = MTbody['Y_RIGHT_INDEX']*-1
# calculate the speed
MTbody['speed_left_index'] = calcspeed(MTbody, coln = ['X_LEFT_INDEX', 'Y_LEFT_INDEX'])
MTbody['speed_right_index'] = calcspeed(MTbody, coln = ['X_RIGHT_INDEX', 'Y_RIGHT_INDEX'])
return MTface, MTbody
# lets make a video where this plot is embedded on the left upper corner as a small plot with the vertical line moving
# we loop through each frame of the video using opencv, and add the plot to the video
# we also make a function as we are going to add this plot in a video to assess the velocity of the nose next to the video
# lets just load some packages again that we need here
import cv2
import numpy as np
from moviepy.editor import VideoFileClip
from IPython.display import Video
import matplotlib.pyplot as plt
import tempfile
tempfolder = tempfile.mkdtemp()
# loop through all the videos
for video in all_videos:
    print('working on ' + video)
# Load in the video
cap = cv2.VideoCapture(video)
frametotal = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) #supposed framecount
# get the frame width and height and make it bigger
frame_width = int(cap.get(3))*3
frame_height = int(cap.get(4))*3
# define the codec and create a VideoWriter object
# replace backward slash with forward
newvideoname = video.replace('\\', '/')
newvideoname = newvideoname.split('/')[-1]
print('processing video: ' + newvideoname)
out = cv2.VideoWriter('../augmented_videos/'+newvideoname, cv2.VideoWriter_fourcc('M','J','P','G'), 25, (frame_width*2,frame_height))
MTbody = pd.read_csv(processed_data_fol + newvideoname.split('.')[0] + "_body.csv")
MTface = pd.read_csv(processed_data_fol + newvideoname.split('.')[0] + "_face.csv")
# preprocess the data
MTface, MTbody = preprocessing(MTface, MTbody)
# loop through the video and add the plot to the video on left upper corner in small inset
frame_number = 0
# while opened
while(cap.isOpened()):
ret, frame = cap.read()
if ret == True:
frame = cv2.resize(frame, (int(frame_width), int(frame_height)))
# generate tempfolder
plotbig = plot_multiple_speedpeaks_gesturespace(MTface, MTbody, current_frame_counter=frame_number)
plotbig.savefig(tempfolder + '/tempfig.png')
# load in plot as image
plotimg = cv2.imread(tempfolder + '/tempfig.png')
# delete the image
os.remove(tempfolder + '/tempfig.png')
# resize the plot image
plotimg = cv2.resize(plotimg, (int(frame_width), int(frame_height)))
# concatenate this frame next to the video frame on the right
frame = np.concatenate((frame, plotimg), axis=1)
out.write(frame)
frame_number += 1
else:
break
cap.release()
out.release()
    # re-encode the video with moviepy so that you can open it in your regular video player
clip = VideoFileClip('../augmented_videos/'+ newvideoname)
clip.write_videofile('../augmented_videos/rerendered/'+ newvideoname)
# show the first augmented video
allrerenderedvideos = glob.glob('../augmented_videos/rerendered/' + '*.mp4')
Video(allrerenderedvideos[0], embed=True, width=900)
Now that we have done some sanity checking, we can move on and extract all the numerical data. You can simply add new columns with summary features for each trial to your metadata. See below what our metadata file looks like: we have a trial ID, a condition variable (just a dummy here), and maybe some other variables. We can add columns to it and save some new information for each trial.
You have hopefully added some new things during the exercises. Could you make sure that they are correctly saved into your dataframe?
# the flat dataset is a dataset where for each trial you have aggregated information (already)
print(meta.head())
  trial_name condition  type
0   01_b_P16    dummy1  test
1    02_a_P8    dummy1  test
2    09_a_P7    dummy2  test
3   17_a_P15    dummy2  test
4   21_b_P12    dummy1  test
# now lets loop through the meta data, enrich the columns with kinematic summary data
# lets loop through the meta
for row in range(len(meta)):
# get the trial name
trial_name = meta['trial_name'][row]
# Load in the facial landmarks data
MTface = pd.read_csv(processed_data_fol + trial_name + "_face.csv")
MTbody = pd.read_csv(processed_data_fol + trial_name + "_body.csv")
# preprocess the data
MTface, MTbody = preprocessing(MTface, MTbody)
# get the head pitch peaks
peaks, peak_prom, troughs, trough_prom, numpeaks, avgprom, maxprom = get_peak_head_data(MTface['face_rotation_X'])
# save the data in the meta
meta.at[row, 'head_pitch_peaks'] = numpeaks
meta.at[row, 'head_pitch_avgprom'] = avgprom
meta.at[row, 'head_pitch_maxprom'] = maxprom
# get the head yaw peaks
peaks, peak_prom, troughs, trough_prom, numpeaks, avgprom, maxprom = get_peak_head_data(MTface['face_rotation_Y'])
# save the data in the meta
meta.at[row, 'head_yaw_peaks'] = numpeaks
meta.at[row, 'head_yaw_avgprom'] = avgprom
meta.at[row, 'head_yaw_maxprom'] = maxprom
# get the head roll peaks
peaks, peak_prom, troughs, trough_prom, numpeaks, avgprom, maxprom = get_peak_head_data(MTface['face_rotation_Z'])
# save the data in the meta
meta.at[row, 'head_roll_peaks'] = numpeaks
meta.at[row, 'head_roll_avgprom'] = avgprom
meta.at[row, 'head_roll_maxprom'] = maxprom
# get the hand speed peaks
peaks, numpeaks, average_peak_vel, maxpeak = get_peak_hand_data(MTbody['speed_left_index'])
# save the data in the meta
meta.at[row, 'hand_speed_left_peaks'] = numpeaks
meta.at[row, 'hand_speed_left_avgpeak'] = average_peak_vel
meta.at[row, 'hand_speed_left_maxpeak'] = maxpeak
# get the hand speed peaks
peaks, numpeaks, average_peak_vel, maxpeak = get_peak_hand_data(MTbody['speed_right_index'])
# save the data in the meta
meta.at[row, 'hand_speed_right_peaks'] = numpeaks
meta.at[row, 'hand_speed_right_avgpeak'] = average_peak_vel
meta.at[row, 'hand_speed_right_maxpeak'] = maxpeak
# get the gesture space
gspace, heightgesture, widthgesture, minhorizontal, maxhorizontal, minvertical, maxvertical = gesturespace(MTbody)
# save the data in the meta
meta.at[row, 'gesture_space'] = gspace
meta.at[row, 'gesture_height'] = heightgesture
meta.at[row, 'gesture_width'] = widthgesture
# save the meta as an enriched file
meta.to_csv("../meta/meta_dummy_enriched.csv", index=False)
print('this is what the enriched "flat" dataset now looks like after extracting the kinematics')
meta.head()
this is what the enriched "flat" dataset now looks like after extracting the kinematics
| trial_name | condition | type | head_pitch_peaks | head_pitch_avgprom | head_pitch_maxprom | head_yaw_peaks | head_yaw_avgprom | head_yaw_maxprom | head_roll_peaks | ... | head_roll_maxprom | hand_speed_left_peaks | hand_speed_left_avgpeak | hand_speed_left_maxpeak | hand_speed_right_peaks | hand_speed_right_avgpeak | hand_speed_right_maxpeak | gesture_space | gesture_height | gesture_width | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01_b_P16 | dummy1 | test | 7.0 | 10.384594 | 20.278464 | 3.0 | 23.631353 | 26.948523 | NaN | ... | NaN | 3.0 | 156.389025 | 210.409314 | 9.0 | 81.488176 | 250.727142 | 0.262754 | 0.437633 | 0.600398 |
| 1 | 02_a_P8 | dummy1 | test | 5.0 | 14.325548 | 23.741412 | NaN | NaN | NaN | NaN | ... | NaN | 4.0 | 94.224422 | 164.126991 | 6.0 | 100.278225 | 198.381721 | 0.138539 | 0.234211 | 0.591511 |
| 2 | 09_a_P7 | dummy2 | test | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | 0.0 | NaN | NaN | 1.0 | 58.473166 | 58.473166 | 0.012066 | 0.034852 | 0.346218 |
| 3 | 17_a_P15 | dummy2 | test | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | 0.0 | NaN | NaN | 2.0 | 123.057173 | 201.840128 | 0.089115 | 0.160349 | 0.555756 |
| 4 | 21_b_P12 | dummy1 | test | 4.0 | 10.993131 | 12.791054 | NaN | NaN | NaN | NaN | ... | NaN | 3.0 | 70.236509 | 113.055866 | 3.0 | 138.095452 | 232.055339 | 0.185006 | 0.342062 | 0.540855 |
5 rows × 21 columns
In our metadata we now have condition as a variable, as well as the kinematic summary data for each trial. So we can start working with this in our plots and statistical analysis. Below is some code for a figure showing the differences between conditions in the head pitch prominence measures.
Could you plot some other things? Could you create another plot to see whether some features are correlated? (A sketch follows after the boxplot code below.)
# lets make a boxplot with the new meta data
import matplotlib.pyplot as plt
import numpy as np
# Load in the enriched meta data
meta = pd.read_csv("../meta/meta_dummy_enriched.csv")
# lets make a boxplot with jitters with condition on the x axis
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
# per condition, plot the head pitch prominence measures (max and average), ignoring NaNs
meta.boxplot(column=['head_pitch_maxprom', 'head_pitch_avgprom'], by='condition', ax=ax[0], grid=False)
# axis labels
ax[0].set_xlabel('condition')
ax[0].set_ylabel('degrees')
plt.show()
plt.close()
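A possible sketch for the correlation exercise above (using the feature columns created earlier): the pairwise correlation matrix plus a scatter plot give a quick impression of whether features covary.

# quick check of whether some extracted features are correlated
feature_cols = ['head_pitch_maxprom', 'hand_speed_right_maxpeak', 'gesture_space']
print(meta[feature_cols].corr()) # pairwise Pearson correlations (NaNs are ignored pairwise)
plt.figure(figsize=(5, 5))
plt.scatter(meta['head_pitch_maxprom'], meta['hand_speed_right_maxpeak'])
plt.xlabel('head pitch max prominence (degrees)')
plt.ylabel('right hand max speed peak (cm/s)')
plt.title('head versus hand prominence')
plt.show()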
It turns out that the brow raises are not as reliable, and they seem to be correlated with head pitch. We therefore need to correct for head pitch to get reliable brow raise measures, as shown by Anna Kuznetsova & Vadim Kimmelman: https://arxiv.org/html/2403.10367v1. I will leave this feature for future contributions to this module (but see the sketch at the end for one possible starting point).
# for the brow movement we are also making a function
def get_brow_movement(MTface):
    # create a temporary copy of the MTface dataframe (so we do not modify the original)
    MTface_proc = MTface.copy()
    # lets first smooth the eye points a little (use an odd window length for savgol_filter)
    for marker in all_selected_eye_landmarks:
        for i in range(3):
            MTface_proc[marker[i]] = savgol_filter(MTface_proc[marker[i]], 25, 3)
    # from here on, work with the smoothed copy
    MTface = MTface_proc
    # left brow: lets determine the centroid of the left eye
left_eye_mid_x = (MTface[left_inner_eye[0][0]] + MTface[left_inner_eye[1][0]])/2
left_eye_mid_y = (MTface[left_inner_eye[0][1]] + MTface[left_inner_eye[1][1]])/2
left_eye_mid_z = (MTface[left_inner_eye[0][2]] + MTface[left_inner_eye[1][2]])/2
# determine the average euclidean distance between all brow points to this centroid
distances_left = []
for browpoint in left_brows:
        # normalize the eye-center-to-brow distance by a stable reference distance on the face (between the two normalization points defined above)
norm_x1 = MTface[normalizationpoints[0][0]]
norm_y1 = MTface[normalizationpoints[0][1]]
norm_z1 = MTface[normalizationpoints[0][2]]
norm_x2 = MTface[normalizationpoints[1][0]]
norm_y2 = MTface[normalizationpoints[1][1]]
norm_z2 = MTface[normalizationpoints[1][2]]
normdist = np.sqrt((norm_x1 - norm_x2)**2 + (norm_y1 - norm_y2)**2 + (norm_z1 - norm_z2)**2)
dist = np.sqrt((MTface[browpoint[0]] - left_eye_mid_x)**2 + (MTface[browpoint[1]] - left_eye_mid_y)**2 + (MTface[browpoint[2]] - left_eye_mid_z)**2)
        # store the brow-to-eye-centroid distance relative to the normalization distance
distances_left.append(dist-normdist)
mean_browdistances_left = np.mean(distances_left, axis=0)
    # lets smooth once more (odd window length for savgol_filter)
    mean_browdistances_left = savgol_filter(mean_browdistances_left, 25, 3)
# also center by itself
mean_browdistances_left = mean_browdistances_left - np.mean(mean_browdistances_left)
# do the same for the right brow
right_eye_mid_x = (MTface[right_inner_eye[0][0]] + MTface[right_inner_eye[1][0]])/2
right_eye_mid_y = (MTface[right_inner_eye[0][1]] + MTface[right_inner_eye[1][1]])/2
right_eye_mid_z = (MTface[right_inner_eye[0][2]] + MTface[right_inner_eye[1][2]])/2
# determine the average euclidean distance between all brow points to this centroid
distances_right = []
for browpoint in right_brows:
        # normalize the distance by the reference distance between the two normalization points (as for the left brow)
norm_x1 = MTface[normalizationpoints[0][0]]
norm_y1 = MTface[normalizationpoints[0][1]]
norm_z1 = MTface[normalizationpoints[0][2]]
norm_x2 = MTface[normalizationpoints[1][0]]
norm_y2 = MTface[normalizationpoints[1][1]]
norm_z2 = MTface[normalizationpoints[1][2]]
normdist = np.sqrt((norm_x1 - norm_x2)**2 + (norm_y1 - norm_y2)**2 + (norm_z1 - norm_z2)**2)
# distance from nose to center of eyes
dist = np.sqrt((MTface[browpoint[0]] - right_eye_mid_x)**2 + (MTface[browpoint[1]] - right_eye_mid_y)**2 + (MTface[browpoint[2]] - right_eye_mid_z)**2)
distances_right.append(dist-normdist)
mean_browdistances_right = np.mean(distances_right, axis=0)
    # lets smooth once more (odd window length for savgol_filter)
    mean_browdistances_right = savgol_filter(mean_browdistances_right, 25, 3)
mean_browdistances_right = mean_browdistances_right - np.mean(mean_browdistances_right)
return mean_browdistances_left, mean_browdistances_right
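As a starting point for that future contribution, here is a very simple (and not validated) sketch of how one might correct the brow measure for head pitch: regress the brow-to-eye distance on the head pitch and keep the residuals as the pitch-corrected brow movement.

# rough sketch: remove the linear effect of head pitch from a brow distance time series
def pitch_corrected_brow(brow_distance, head_pitch):
    """Return the brow distance with the linear head pitch component regressed out."""
    brow_distance = np.asarray(brow_distance)
    head_pitch = np.asarray(head_pitch)
    slope, intercept = np.polyfit(head_pitch, brow_distance, 1) # linear fit: brow ~ pitch
    return brow_distance - (slope * head_pitch + intercept) # residual brow movement

# example usage (hypothetical, once get_brow_movement has been run on a trial):
# brow_left, brow_right = get_brow_movement(MTface)
# brow_left_corrected = pitch_corrected_brow(brow_left, MTface['face_rotation_X'])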